{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook Setup \n",
    "The following cell will install Drake, checkout the underactuated repository, and set up the path (only if necessary).\n",
    "- On Google's Colaboratory, this **will take approximately two minutes** on the first time it runs (to provision the machine), but should only need to reinstall once every 12 hours.  Colab will ask you to \"Reset all runtimes\"; say no to save yourself the reinstall.\n",
    "- On Binder, the machines should already be provisioned by the time you can run this; it should return (almost) instantly.\n",
    "\n",
    "More details are available [here](http://underactuated.mit.edu/drake.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "  import pydrake\n",
    "  import underactuated\n",
    "except ImportError:\n",
    "  !curl -s https://raw.githubusercontent.com/RussTedrake/underactuated/master/scripts/setup/jupyter_setup.py > jupyter_setup.py\n",
    "  from jupyter_setup import setup_underactuated\n",
    "  setup_underactuated()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Grid World\n",
    "\n",
    "The setup here is *almost* identical as the simplest version described in the notes.  The only difference is that this agent is allowed to move diagonally in a single step; this is slightly easier to code since I can have two actions (one for left/right, and another for up/down), and write the dynamics as the trivial linear system ${\\bf x}[n+1] = {\\bf u}[n].$  Only the value iteration code needs to know that the states and actions are actually restricted to the integers.\n",
    "\n",
    "The obstacle (pit of despair) is provided by the method below.  Play around with it!  The rest of the code is mostly to support visualization.\n",
    "\n",
    "TODO: Pull a few more of the visualization frills from my old [code](https://github.com/RobotLocomotion/drake/blob/last_sha_with_original_matlab/drake/examples/GridWorld.m).  At very least, I want to draw the vector field of the resulting policy.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib import cm\n",
    "\n",
    "from pydrake.all import DiagramBuilder, LinearSystem, Simulator\n",
    "from pydrake.systems.controllers import (DynamicProgrammingOptions,\n",
    "                                         FittedValueIteration)\n",
    "\n",
    "from underactuated.jupyter import SetupMatplotlibBackend\n",
    "plt_is_interactive = SetupMatplotlibBackend()\n",
    "\n",
    "\n",
    "time_step = 1\n",
    "# TODO(russt): Support discrete-time systems in the dynamic programming code, and use this properly.\n",
    "#plant = LinearSystem(A=np.eye(2), B=np.eye(2), C=np.eye(2), D=np.zeros((2,2)), time_period=time_step)\n",
    "# for now, just cheat because I know how to make the discrete system as a continuous that will be discretized.\n",
    "plant = LinearSystem(A=np.zeros((2,2)), B=np.eye(2), C=np.eye(2), D=np.zeros((2,2)))\n",
    "simulator = Simulator(plant)\n",
    "options = DynamicProgrammingOptions()\n",
    "\n",
    "xbins = range(0, 21)\n",
    "ybins = range(0, 21)\n",
    "state_grid = [set(xbins), set(ybins)]\n",
    "\n",
    "input_grid = [set([-1, 0, 1]), set([-1, 0, 1])]\n",
    "\n",
    "goal = [2, 8]\n",
    "\n",
    "def obstacle(x):\n",
    "    return x[0]>=6 and x[0]<=8 and x[1]>=4 and x[1]<=7\n",
    "\n",
    "[X, Y] = np.meshgrid(xbins, ybins)\n",
    "\n",
    "def draw(iteration, mesh, cost_to_go, policy):\n",
    "    plt.title(\"iteration \" + str(iteration))\n",
    "    J = np.reshape(cost_to_go, X.shape)\n",
    "    im = ax.imshow(J, cmap=cm.jet)\n",
    "    fig.canvas.draw()\n",
    "    im.remove()\n",
    "\n",
    "    # Slow down so you can watch the iterations of the algorithm.\n",
    "    plt.pause(0.5)  \n",
    "\n",
    "if plt_is_interactive:\n",
    "  options.visualization_callback = draw\n",
    "\n",
    "def solve():\n",
    "    policy, cost_to_go = FittedValueIteration(simulator, cost_function, state_grid,\n",
    "                                              input_grid, time_step, options)\n",
    "\n",
    "    J = np.reshape(cost_to_go, X.shape)\n",
    "    im = ax.imshow(J, cmap=cm.jet)\n",
    "    plt.colorbar(im)\n",
    "    return policy\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's define a minimum-time cost to the goal, and run value iteration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def min_time_cost(context):\n",
    "    x = context.get_continuous_state_vector().CopyToVector()\n",
    "    x = np.round(x)\n",
    "    if obstacle(x):\n",
    "        return 10\n",
    "    if np.array_equal(x, goal):\n",
    "        return 0\n",
    "    return 1\n",
    "    \n",
    "cost_function = min_time_cost\n",
    "options.convergence_tol = .1;\n",
    "\n",
    "fig = plt.figure(figsize=(9, 4))\n",
    "ax = fig.gca()\n",
    "ax.set_xlabel(\"x\")\n",
    "ax.set_ylabel(\"y\")\n",
    "ax.set_title(\"Cost-to-Go\")\n",
    "\n",
    "policy = solve()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your turn.  Change the cost.  Change the obstacles."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
